Corpus: snd_wikipedia_2012


3.7.3 Distribution of the string similarity for different rank ranges

Distribution of the Levenshtein distance for words in different rank ranges

String similarity for top-1,000 words
Distance  Percentage of words
0          2.8986
1         15.9420
2         81.1594

String similarity for top-10,000 words
Distance  Percentage of words
0          1.4143
1         11.7454
2         86.8403

String similarity for top-100,000 words
Distance  Percentage of words
0          1.0109
1         12.4279
2         86.5613

String similarity for top-1,000,000 words
Distance  Percentage of words
0          1.0109
1         12.4279
2         86.5613
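The tables report, for each rank cutoff, what share of words falls at each Levenshtein distance. The source does not say which word pairs are compared, so the following is only a minimal sketch that assumes a hypothetical, rank-ordered list of `(word, counterpart)` pairs. It computes the standard edit distance (insertions, deletions and substitutions, each costing 1) and the percentage of pairs per distance within the top `top_n` ranks. The names `levenshtein`, `distance_distribution`, `pairs` and `top_n` are illustrative and not part of the corpus tooling.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_distribution(pairs, top_n=None):
    """Percentage of pairs per Levenshtein distance, rounded to 4 decimals.

    `pairs` is a hypothetical list of (word, counterpart) tuples ordered by the
    rank of `word`; `top_n` keeps only the first top_n ranks (None = all).
    """
    selected = pairs if top_n is None else pairs[:top_n]
    counts = Counter(levenshtein(w, c) for w, c in selected)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: round(100.0 * n / total, 4) for d, n in sorted(counts.items())}
```

For example, with `pairs = [("kitten", "sitting"), ("flaw", "lawn"), ("abc", "abc"), ("abc", "abd")]`, the full list yields one pair each at distances 3, 2, 0 and 1 (25% each). `top_n=2` keeps only the first two pairs (50% at distance 2, 50% at distance 3). Because larger cutoffs include every word of the smaller ones, this also shows why two cutoffs can produce identical rows once the list is shorter than both.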
101 msec needed at 2017-10-24 01:00